An Efficient Density Conscious Subspace Clustering Method using Top-down and Bottom-up Strategies
Author
Abstract
Clustering high-dimensional data is an emerging research field. Most clustering techniques use distance measures to build clusters. In high-dimensional spaces, traditional clustering algorithms suffer from a problem called the “curse of dimensionality”. Subspace clustering groups similar objects embedded in subspaces of the full space, and recent approaches attempt to find such clusters in high-dimensional data. Most previous subspace clustering work discovers subspace clusters by regarding clusters as regions of higher density: a region is identified as dense if its density exceeds a density threshold. Because cluster densities vary across subspace cardinalities, these approaches suffer from a problem called the “density divergence problem”. We follow the basic assumptions of the previous work DENCOS, in which varying region densities are used to overcome the density divergence problem. All previous approaches are based on a bottom-up method. In this paper, a novel data structure is used that works in both bottom-up and top-down fashion. Performance results for this data structure are very good, and its efficiency exceeds that of previous work.
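The density divergence problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple equal-width grid over coordinates in [0, 1) and a threshold proportional to the average unit count N / bins^k for a subspace of cardinality k. The function names `unit_counts` and `dense_units` and the parameter `alpha` are hypothetical and chosen for illustration only; this is not the data structure or the exact thresholds of the paper or of DENCOS.

```python
from collections import Counter


def unit_counts(points, dims, bins):
    """Count points per grid unit in the subspace spanned by `dims`.

    Assumes every coordinate lies in [0, 1) and cuts each selected
    dimension into `bins` equal-width intervals.
    """
    return Counter(
        tuple(min(int(p[d] * bins), bins - 1) for d in dims) for p in points
    )


def dense_units(points, dims, bins, alpha):
    """Return the units whose count reaches a cardinality-aware threshold.

    The threshold is alpha times the average unit count N / bins**k,
    where k = len(dims). This scaling is an illustrative assumption.
    """
    threshold = alpha * len(points) / bins ** len(dims)
    counts = unit_counts(points, dims, bins)
    return {unit: c for unit, c in counts.items() if c >= threshold}


if __name__ == "__main__":
    points = [(0.10, 0.10), (0.15, 0.90), (0.95, 0.95)]
    # Counts per unit shrink as the subspace cardinality grows,
    # so one fixed threshold cannot serve both subspaces.
    print(unit_counts(points, (0,), bins=2))
    print(unit_counts(points, (0, 1), bins=2))
    print(dense_units(points, (0,), bins=2, alpha=1.0))
    print(dense_units(points, (0, 1), bins=2, alpha=1.0))
```

In this toy data set, a fixed count threshold of 1.5 marks one unit as dense in the 1-dimensional subspace but none in the 2-dimensional one, whereas the scaled threshold adapts to the lower average count in the larger subspace.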
Similar resources
Clustering in applications with multiple data sources - A mutual subspace clustering approach
In many applications, such as bioinformatics and cross-market customer relationship management, there are data from multiple sources jointly describing the same set of objects. An important data mining task is to find interesting groups of objects that form clusters in subspaces of the data sources jointly supported by those data sources. In this paper, we study a novel problem of mining mutual...
Detecting Outlying Subspaces for High-Dimensional Data: A Heuristic Search Approach
In this paper, we identify a new task for studying the outlying degree of high-dimensional data, i.e. finding the subspaces (subset of features) in which given points are outliers, and propose a novel detection algorithm, called HighD Outlying subspace Detection (HighDOD). We measure the outlying degree of the point using the sum of distances between this point and its k nearest neighbors. Heur...
Density-Connected Subspace Clustering for High-Dimensional Data
Several application domains such as molecular biology and geography produce a tremendous amount of data which can no longer be managed without the help of efficient and effective data mining methods. One of the primary data mining tasks is clustering. However, traditional clustering algorithms often fail to detect meaningful clusters because most real-world data sets are characterized by a high...
DensEst: Density Estimation for Data Mining in High Dimensional Spaces
Subspace clustering and frequent itemset mining via “step-by-step” algorithms that search the subspace/pattern lattice in a top-down or bottom-up fashion do not scale to large high dimensional databases. Recent “jump” algorithms directly choose candidate subspace regions or patterns. Their scalability and quality depend heavily on the rating of these candidates as mislead jumps incur poor resul...
Automatische Parameterbestimmung durch Gravitation in Subspace Clustering
Abstract: Compared with traditional clustering methods, subspace clustering makes it possible to search for clusters in the subspaces of the data. Two main types of subspace clustering methods are distinguished: top-down and bottom-up. Top-down algorithms narrow the search space from high to low dimensions. In the bottom-up approach...